Large contingency tables arise in many contexts, but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables that summarize such contingency tables. We propose such an approach in this paper: we find the members of a class of restricted log-linear models that maximize the likelihood of the data and use them to obtain a parsimonious representation of the table. In contrast with more standard approaches to model search in hierarchical log-linear models (HLLMs), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data simplification procedure prior to HLL modeling. A feature of the procedure is that it can easily be applied to many tables with millions of cells, providing a new way of summarizing large data sets in many disciplines. The focus is on information and description rather than statistical testing. The procedure can treat each variable in the table differently: preserving full detail, treating it as fully nominal, or preserving ordinality.